Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)
Abstract
Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms, including polynomial evaluation, sorting, and building data structures. This paper introduces prefix scan and describes a step-by-step procedure for implementing it efficiently with Compute Unified Device Architecture (CUDA). It starts with a basic naive algorithm and proceeds through more advanced techniques to obtain the best performance.
Keywords: Scan, Parallel prefix sum, Prefix scan, CUDA, Parallel algorithms, Naïve algorithm
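As a rough illustration of what a prefix scan computes, the sketch below simulates a naive log-step scan on the CPU. The abstract does not show the paper's own naive algorithm, so the widely described Hillis-Steele pattern is assumed here. The function names `naive_inclusive_scan` and `to_exclusive` are hypothetical, and the inner loop only stands in for the threads a CUDA kernel would launch.

```python
def naive_inclusive_scan(values):
    """Inclusive prefix sum using a log-step (Hillis-Steele style) pattern.

    Assumption: this mirrors the commonly described naive parallel scan,
    not necessarily the exact algorithm used in the paper.
    """
    src = list(values)
    n = len(src)
    offset = 1
    while offset < n:
        # Double buffering: read only from src and write only to dst,
        # so no "thread" sees a value updated earlier in the same step.
        dst = src[:]
        for i in range(offset, n):  # each i stands in for one GPU thread
            dst[i] = src[i] + src[i - offset]
        src = dst
        offset *= 2  # the reach of each partial sum doubles every step
    return src


def to_exclusive(inclusive):
    """Shift an inclusive scan right by one, inserting the identity 0."""
    return [0] + inclusive[:-1] if inclusive else []


data = [3, 1, 7, 0, 4, 1, 6, 3]
inclusive = naive_inclusive_scan(data)
exclusive = to_exclusive(inclusive)
print(inclusive)  # [3, 4, 11, 11, 15, 16, 22, 25]
print(exclusive)  # [0, 3, 4, 11, 11, 15, 16, 22]
```

This pattern finishes in about log2(n) steps but performs on the order of n log n additions in total, which is one reason a naive scan is usually treated as a starting point rather than the final implementation.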
Similar resources
Parallel Compact Genetic Algorithm on CUDA-C Platform
This paper deals with the parallel implementation of the compact Genetic Algorithm on the Compute Unified Device Architecture (CUDA) platform of the GPU. We elaborate on implementation details on the parallel platform.
Parallel Optimized Algorithm for Apriori Association Rule Mining on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. The GPU (Graphics Processing Unit) has now taken a major role in high-performance computing for general-purpose applications. Compute Unified Device Architecture (CUDA) programm...
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with regard to thread concurrency. In this paper, to resolve the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
Image Based Virtual Dimension Compute Unified Device Architecture of Parallel Processing Technology
There are a number of virtual dimension typical targets in a hyperspectral image. Determining the virtual dimension is the first step in many applications of hyperspectral images. In view of the high time complexity of the virtual dimension calculation method, and according to the highly parallel features of the calculation, in this paper graphics processing unit (GPU) using the Compute Unified...
Parallel design of JPEG-LS encoder on graphics processing units
With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, ad...